Statistical inference, part I

Eva Freyhult

NBIS, SciLifeLab

April 23, 2024

Introduction to hypothesis tests

Statistical inference is the process of drawing conclusions about properties of a population based on observations of a random sample from that population.

A hypothesis test is a form of inference that evaluates whether a hypothesis about a population is supported by the observations of a random sample (i.e. by the data available).

Typically, the hypotheses that are tested are assumptions about properties of a population, such as a proportion, mean, mean difference, or variance.

The null and alternative hypothesis

Null hypothesis, \(H_0\)

\(H_0\) is in general neutral

  • no change
  • no difference between groups
  • no association

In general we want to show that \(H_0\) is false.

Alternative hypothesis, \(H_1\)

\(H_1\) expresses what the researcher is interested in

  • the treatment has an effect
  • there is a difference between groups
  • there is an association

The alternative hypothesis can also be directional

  • the treatment has a positive effect

To perform a hypothesis test

  1. Define \(H_0\) and \(H_1\)
  2. Select an appropriate significance level, \(\alpha\)
  3. Select an appropriate test statistic, \(T\), and compute the observed value, \(t_{obs}\)
  4. Assume that \(H_0\) is true and compute the sampling distribution of \(T\).
  5. Compare the observed value, \(t_{obs}\), with the sampling distribution under \(H_0\) and compute a p-value. The p-value is the probability of observing a value at least as extreme as the observed value, if \(H_0\) is true.
  6. Based on the p-value, either accept or reject \(H_0\).

Null distribution

A sampling distribution is the distribution of a sample statistic. The sampling distribution can be obtained by drawing a large number of samples from a specific population.

The null distribution is a sampling distribution when the null hypothesis is true.
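A sampling distribution can be illustrated by simulation. The sketch below assumes, purely for illustration, a standard normal population and the sample mean as the statistic:

```python
# Simulate the sampling distribution of the sample mean:
# draw many samples of size n from a known population and
# compute the mean of each sample.
import numpy as np

rng = np.random.default_rng(1)
n, n_samples = 10, 100_000
samples = rng.normal(loc=0, scale=1, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The means center on the population mean, with spread sigma/sqrt(n)
print(round(sample_means.mean(), 2))   # about 0.0
print(round(sample_means.std(), 2))    # about 1/sqrt(10) = 0.32
```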

p-value

The p-value is the probability of obtaining the observed value, or something more extreme, if the null hypothesis is true.

Error types

A hypothesis test is used to draw inference about a population based on a random sample. The inference made might, of course, be wrong. There are two types of errors:

Type I error is a false positive, a false alarm that occurs when \(H_0\) is rejected when it is actually true. Examples: “The test says that you are covid-19 positive, when you actually are not”, “The test says that the drug has a positive effect on patient symptoms, but it actually has not”.

Type II error is a false negative, a miss that occurs when \(H_0\) is accepted, when it is actually false. Examples: “The test says that you are covid-19 negative, when you actually have covid-19”, “The test says that the drug has no effect on patient symptoms, when it actually has”.

Probability of type I and II errors

The probability of type I and II errors are denoted \(\alpha\) and \(\beta\), respectively.

\[\alpha = P(\textrm{type I error}) = P(\textrm{false alarm}) = P(\textrm{Reject }H_0|H_0 \textrm{ is true})\] \[\beta = P(\textrm{type II error}) = P(\textrm{miss}) = P(\textrm{Accept }H_0|H_1 \textrm{ is true})\]

The significance level, \(\alpha\), is the risk of false alarm.
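The meaning of \(\alpha\) can be illustrated by simulation: if we repeatedly test a true \(H_0\), the long-run proportion of (false) rejections should be close to the chosen significance level. A minimal sketch; the normal population, sample size, and one-sample t-test are assumptions for illustration only:

```python
# Estimate the type I error rate by simulation: repeatedly draw
# samples from a population where H0 is true (mean 0, testing
# H0: mu = 0) and record how often H0 is rejected at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_tests = 10_000
rejections = 0
for _ in range(n_tests):
    x = rng.normal(loc=0, scale=1, size=20)  # H0 is true here
    _, p = stats.ttest_1samp(x, popmean=0)
    if p <= alpha:
        rejections += 1

print(rejections / n_tests)  # close to alpha = 0.05
```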

Probability of type I and II errors

Figure 1: The probability density functions under H0 and H1, respectively. The probability of type I error (\(\alpha\)) and type II error (\(\beta\)) are indicated.

Significance level

The significance level, \(\alpha\), is the risk of false alarm.

\(\alpha\) should be set before the hypothesis test is performed.

Common values to use are \(\alpha=0.05\) or 0.01.

  • If the p-value is above the significance level, \(p>\alpha\), \(H_0\) is accepted.
  • If the p-value is below the significance level, \(p \leq \alpha\), \(H_0\) is rejected.

Statistical power

Statistical power is defined as

\[\textrm{power} = 1 - \beta = P(\textrm{Reject }H_0 | H_1\textrm{ is true}).\]
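For a concrete case, power can be computed directly from this definition. A sketch for a one-sided z-test; the values \(\mu_0 = 0\), \(\mu_1 = 1\), \(\sigma = 2\), and \(n = 25\) are assumed for illustration:

```python
# Power of a one-sided z-test, H0: mu = 0 vs H1: mu > 0.
# Assumed illustration values: true mean under H1 is 1, sigma = 2, n = 25.
from scipy import stats

mu0, mu1, sigma, n, alpha = 0.0, 1.0, 2.0, 25, 0.05
se = sigma / n**0.5                      # standard error of the mean
z_crit = stats.norm.ppf(1 - alpha)       # reject H0 if z > z_crit

# Power: probability of exceeding the critical value when H1 is true
power = 1 - stats.norm.cdf(z_crit - (mu1 - mu0) / se)
print(round(power, 3))  # about 0.80
```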

Perform a hypothesis test

You suspect that a die is loaded, i.e. shows ‘six’ more often than expected of a fair die. To test this you throw the die 10 times and count the total number of sixes. You observe 5 sixes. Is there reason to believe that the die is loaded?

  1. Define \(H_0\) and \(H_1\)
  2. Select an appropriate significance level, \(\alpha\)
  3. Select appropriate test statistic, \(T\), and \(t_{obs}\)
  4. Compute the sampling distribution of \(T\) when \(H_0\) is true
  5. Compare the observed value, \(t_{obs}\), with the computed sampling distribution under \(H_0\) and compute a p-value. The p-value is the probability of observing a value at least as extreme as the observed value, if \(H_0\) is true.
  6. Based on the p-value either accept or reject \(H_0\).
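In the die example the null distribution is known exactly: under \(H_0\) the number of sixes in 10 throws is Bin(10, 1/6). It can also be simulated from the known \(H_0\), as sketched below (numpy and the number of simulations are implementation choices):

```python
# Simulate the null distribution for the die example:
# number of sixes in 10 throws of a fair die, repeated many times.
import numpy as np

rng = np.random.default_rng(1)
n_sim = 100_000
throws = rng.integers(1, 7, size=(n_sim, 10))   # 10 throws per experiment
n_sixes = (throws == 6).sum(axis=1)             # test statistic per experiment

t_obs = 5
p_value = (n_sixes >= t_obs).mean()             # P(T >= 5 | H0)
print(round(p_value, 3))  # close to the exact value, about 0.015
```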

Hypothesis testing using resampling

The null distribution, the sampling distribution of a test statistic under the null hypothesis, is sometimes known or can be approximated. When the null distribution is unknown, another option is to estimate the null distribution using resampling.

  • Simulate from a known \(H_0\)
  • Bootstrap
  • Permutation

Bootstrap

Bootstrap is a resampling technique that resamples with replacement from the available data (random sample) to construct new simulated samples.

Bootstrapping can be used to simulate a sampling distribution and estimate properties such as standard error and interval estimates, but also to perform hypothesis testing.

Bootstrap example

Men with a waist circumference greater than 94 cm have been shown to have an increased risk of cardiovascular disease. Based on the following waist circumferences of 12 diabetic men, is there reason to believe that the mean waist circumference of diabetic men is greater than 94 cm?

Hypotheses

\[H_0: \mu=94\] \[H_1: \mu>94\]

Significance level

Let \(\alpha = 0.05\).

Test statistic

Use the sample mean, \(m\), as test statistic.

\(m_{obs} = 102.33\)

Null distribution

Assume \(H_0\) is true and compute the sampling distribution. This can be done by first modifying the observed values:

\(x_{null} = x - m_{obs} + 94\)

Bootstrap!


p-value

Probability of \(m_{obs}\) or something more extreme: \(p = 0.01\)

Accept or reject \(H_0\)?
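The shift-and-resample procedure can be sketched as follows. The waist measurements below are hypothetical placeholders (the original data are not listed here); the procedure, not the particular numbers, is the point:

```python
# Bootstrap test of H0: mu = 94 vs H1: mu > 94.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([97.0, 104.0, 92.0, 110.0, 99.0, 107.0,
              95.0, 112.0, 101.0, 108.0, 96.0, 106.0])  # hypothetical data
mu0 = 94.0
m_obs = x.mean()

# Shift the sample so that H0 (mu = 94) is true, then resample
# with replacement and recompute the mean many times.
x_null = x - m_obs + mu0
boot_means = np.array([
    rng.choice(x_null, size=len(x_null), replace=True).mean()
    for _ in range(10_000)
])

# One-sided p-value: P(m >= m_obs | H0)
p_value = (boot_means >= m_obs).mean()
print(round(p_value, 4))
```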

Permutation

A permutation test answers the question of whether an observed effect could be due to the random sampling alone, for example to how samples happened to be assigned to the different groups.

Permutation tests are commonly used in clinical settings where a treatment group is compared to a control group.

Permute!

perm   mA   mB    diff
   0   24   22    2.57
   1   23   23   -0.26
   2   22   24   -1.92
   3   24   22    1.30
   4   24   22    1.11
   5   19   27   -7.28

Permutation example

Does a high-fat diet lead to increased body weight in mice?

Study setup:

  1. Order 24 female mice from a lab.

  2. Randomly assign 12 of the 24 mice to receive high-fat diet, the remaining 12 are controls (ordinary diet).

  3. Measure body weight after three weeks.

The observed values, mouse weights in grams, are summarized below:

high-fat: 25 30 23 18 31 24 39 26 36 29 23 32
ordinary: 27 25 22 23 25 37 24 26 21 26 30 24

Simulation example

1. Null and alternative hypotheses

\[ \begin{aligned} H_0: \mu_2 = \mu_1 \iff \mu_2 - \mu_1 = 0\\ H_1: \mu_2>\mu_1 \iff \mu_2-\mu_1 > 0 \end{aligned} \]

where \(\mu_2\) is the (unknown) mean body weight of the high-fat mouse population and \(\mu_1\) is the mean body-weight of the control mouse population.

Studied population: Female mice that can be ordered from a lab.

2. Select appropriate significance level \(\alpha\)

\[\alpha = 0.05\]

3. Test statistic

Of interest is the mean weight difference between high-fat and control mice:

\[D = \bar X_2 - \bar X_1\]

Mean weight of 12 (randomly selected) mice on ordinary diet, \(\bar X_1\). \(E[\bar X_1] = E[X_1] = \mu_1\)

Mean weight of 12 (randomly selected) mice on high-fat diet, \(\bar X_2\). \(E[\bar X_2] = E[X_2] = \mu_2\)

Observed values:

\(\bar x_1 = 25.83\), mean weight of control mice (ordinary diet)

\(\bar x_2 = 28.00\), mean weight of mice on high-fat diet

\(d_{obs} = \bar x_2 - \bar x_1 = 2.17\), difference in mean weights

4. Null distribution

If the high-fat diet has no effect, i.e. if \(H_0\) is true, the result would be as if all mice were given the same diet.

The 24 mice were initially from the same population; depending on how the mice are randomly assigned to the high-fat and control groups, the mean weights would differ even if the two groups were treated the same.

Random reassignment to two groups can be accomplished using permutation.

Assume \(H_0\) is true, i.e. assume all mice are equivalent and

  1. Randomly reassign 12 of the 24 mice to ‘high-fat’ and the remaining 12 to ‘control’.
  2. Compute the difference in mean weights.

If we repeat steps 1-2 many times we get the sampling distribution of the difference in mean weights when \(H_0\) is true, the so-called null distribution.
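Steps 1-2 can be sketched in Python using the observed mouse weights (numpy and the number of permutations are implementation choices):

```python
# Permutation null distribution for the mouse diet example:
# pool all 24 weights, repeatedly reshuffle them into two groups
# of 12, and record the difference in group means.
import numpy as np

high_fat = np.array([25, 30, 23, 18, 31, 24, 39, 26, 36, 29, 23, 32])
ordinary = np.array([27, 25, 22, 23, 25, 37, 24, 26, 21, 26, 30, 24])
d_obs = high_fat.mean() - ordinary.mean()    # 2.17

pooled = np.concatenate([high_fat, ordinary])
rng = np.random.default_rng(1)

null_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)       # random reassignment
    null_diffs.append(shuffled[:12].mean() - shuffled[12:].mean())
null_diffs = np.array(null_diffs)

# One-sided p-value: P(D >= d_obs | H0)
p_value = (null_diffs >= d_obs).mean()
print(round(p_value, 3))  # about 0.17
```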

5. Compute p-value

What is the probability of obtaining a mean difference at least as extreme as our observed value, \(d_{obs}\), if \(H_0\) were true?

\(P(\bar X_2 - \bar X_1 \geq d_{obs} | H_0) = 0.169\)

6. Conclusion?